
Use runs-on GPU runners for CI #1439

Open
dfalbel wants to merge 9 commits into main from runs-on

Conversation


@dfalbel dfalbel commented Apr 15, 2026

Summary

  • Replace [self-hosted, gpu-local] runners with runs-on g4dn.xlarge (T4 GPU) instances for the test-gpu and test-cudatoolkit jobs
  • Uses the ubuntu24-gpu-x64 image which includes pre-installed NVIDIA drivers and container toolkit
  • Docker container setup (--gpus all --runtime=nvidia) is preserved

Test plan

  • Verify test-gpu job runs successfully on runs-on runner
  • Verify test-cudatoolkit job runs successfully on runs-on runner
  • Confirm GPU is accessible inside the container (nvidia-smi)
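
A minimal smoke test for the last item, as an R sketch (assumes the torch package is installed in the container; `nvidia-smi` ships with the NVIDIA driver):

```r
# Print the driver/GPU report, then assert torch can see the T4
status <- system2("nvidia-smi", stdout = TRUE, stderr = TRUE)
cat(status, sep = "\n")

stopifnot(torch::cuda_is_available())
```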

dfalbel added a commit to mlverse/cuda.ml that referenced this pull request Apr 29, 2026
* Use runs-on GPU runners for CI

Replace self-hosted GPU runners with runs-on g4dn.xlarge spot instances,
matching the approach in mlverse/torch#1439. Also modernizes the workflow:

- Action versions: checkout@v4, setup-python@v5, setup-r@v2, etc.
- Fix deprecated ::set-output → $GITHUB_OUTPUT
- Container: ubuntu18.04 → ubuntu20.04 (18.04 is EOL)
- Add --runtime=nvidia to container options
- Add concurrency groups with cancel-in-progress
- Simplify matrix to single config (CUDA 11.2.1, cuML 21.12, R release)
- Drop ASAN matrix dimension

* Revert container to ubuntu18.04 for CUDA 11.2 compatibility

* Use CUDA 11.2.2 container (11.2.1 removed from Docker Hub)

* Bump container to ubuntu20.04 (18.04 glibc too old for Node 20 actions)

* Split CI into build-image (free runner) and test-gpu (GPU runner)

- Build Docker image with cuda.ml pre-installed on ubuntu-latest (free)
- Run tests on runs-on g4dn.xlarge GPU runner using the pre-built image
- Add .github/docker/Dockerfile following the same pattern as mlverse/torch
- Make CMAKE_CUDA_ARCHITECTURES configurable via env var (defaults to NATIVE)
  so cross-compilation works on runners without a GPU (targets T4 = SM 75)
- Remove miniconda install (no longer needed for reticulate tests)
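
A hedged sketch of how the configure script might honor that variable (names here are illustrative, not the exact code):

```r
# Default to NATIVE so local builds detect the GPU at build time;
# GPU-less runners export CMAKE_CUDA_ARCHITECTURES=75 to target a T4.
cuda_archs <- Sys.getenv("CMAKE_CUDA_ARCHITECTURES", unset = "NATIVE")
cmake_args <- sprintf("-DCMAKE_CUDA_ARCHITECTURES=%s", cuda_archs)
```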

* Fix sklearn install: use scikit-learn package name and py_require()

The 'sklearn' PyPI package is deprecated in favor of 'scikit-learn'.
Also switch from py_install() to py_require() which is the modern
reticulate API for declaring Python dependencies.
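
In practice the change is roughly:

```r
# Before: deprecated PyPI name, eager install
# reticulate::py_install("sklearn")

# After: declare the dependency; reticulate resolves it on first use
reticulate::py_require("scikit-learn")
```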

* Fix configure warnings: normalizePath ordering and cmake unused variable

- Move download_libcuml() before normalizePath() so the directory exists
- Reference CUML_STUB_HEADERS_DIR in both Treelite found/not-found branches
  so cmake doesn't warn about unused variable
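
A sketch of the reordering (the path is illustrative):

```r
# download_libcuml() creates the directory; normalizePath() warns on
# paths that don't exist yet, so it has to run second.
download_libcuml()
libcuml_dir <- normalizePath("libcuml")
```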

* Fix TSVD tests for SVD sign ambiguity between cuML and sklearn

SVD components are only defined up to sign, so different implementations
can produce sign-flipped vectors that are mathematically equivalent.
Align signs before comparing components and transformed data.
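
A minimal, self-contained sketch of the alignment, assuming components are stored one per column (a hypothetical helper, not the test's exact code):

```r
align_signs <- function(ref, other) {
  # Flip each column of `other` to match the sign of the matching
  # column of `ref`; SVD components are only defined up to this flip.
  flips <- sign(colSums(ref * other))
  sweep(other, 2, flips, `*`)
}

ref   <- matrix(c(1, 2, 3, 4), ncol = 2)
other <- -ref                            # sign-flipped but equivalent
all.equal(ref, align_signs(ref, other))  # TRUE
```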

* Fix sklearn max_iter type: use integer (10000L) not float (10000.0)

Modern sklearn strictly validates that max_iter is an int. R's default
numeric type is double, which reticulate passes as a Python float.
Using 10000L ensures it's passed as a Python int.
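
The distinction is easy to see from R:

```r
library(reticulate)
class(r_to_py(10000))   # "python.builtin.float" -- rejected by sklearn
class(r_to_py(10000L))  # "python.builtin.int"   -- accepted
```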

* Add CRAN-like check job (no CUDA, stub headers, ubuntu-latest)

Runs R CMD check --as-cran on ubuntu-latest with R release and devel.
No nvcc/CUDA available, so the package builds with stub headers — matching
what CRAN would see.

* Update roxygen

* export S3 methods

* roxygen updates

* Fix CRAN check: escape Rd braces, skip tests without cuML

- Escape literal braces in roxygen comments across R source files and
  templates (e.g. {cuda.ml} -> \{cuda.ml\}, {"opt1",...} -> \{"opt1",...\})
- Regenerate all affected Rd files via devtools::document()
- Skip test_check() when cuML is not linked (CRAN-like environments)
- Use R CMD check directly in CRAN job (avoids rcmdcheck NOT_CRAN=true)
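
For instance (illustrative roxygen lines; the test-skip probe assumes a helper like the package's has_cuML()):

```r
# Before: Rd treats the braces as markup and R CMD check errors
#' Powered by the {cuda.ml} package. Accepts one of {"opt1", "opt2"}.

# After: literal braces escaped for Rd
#' Powered by the \{cuda.ml\} package. Accepts one of \{"opt1", "opt2"\}.

# tests/testthat.R: run the suite only when cuML is actually linked
if (cuda.ml::has_cuML()) {
  testthat::test_check("cuda.ml")
}
```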

* Fix examples brace escaping and register S3 methods

- Revert brace escaping inside @examples blocks (R code, not Rd markup)
- Define cuda_ml_can_predict_class_probabilities methods as proper
  functions so roxygen registers them as S3method() in NAMESPACE

* Add RAPIDS cuML 26.04 + CUDA 12 support

Build infrastructure:
- Dockerfile: CUDA 12.8.1 + Ubuntu 22.04 base image
- libcuml_versions.R: add 26.04 entry pointing to PyPI libcuml-cu12 wheel
- cuml.R: handle pip wheel extraction (lib64/ layout, .whl extension)
- configure.R: handle lib64/ vs lib/ for pip wheels
- CMakeLists.txt.in: C++17, rapids-cmake branch-26.04
- Workflow: target cuML 26.04

C++ API changes for cuML 26.04:
- svm_serde.h: namespace alias MLCommon::Matrix -> ML::matrix for
  KernelParams and KernelType (header renamed kernelparams.h ->
  kernel_params.hpp)
- fil.cu, fil_utils.h, fil_utils.cu: disable FIL on 26.04 with stubs
  (fil.h replaced by modular headers; full adaptation TODO)
- random_projection.cu: disable on 26.04 with stubs (C++ API removed)
- knn.cu: disable on 26.04 with stubs (raft::spatial::knn types removed)
- random_forest_classifier.cu, random_forest_regressor.cu: guard FIL
  prediction paths for 26.04

Backward compatible: cuML 21.x with CUDA 11 still works.

* Test both cuML 21.12 and 26.04 in CI

- Dockerfile: accept CUDA_IMAGE as build arg for different base images
- Workflow: matrix over cuML 21.12 (CUDA 11.2) and 26.04 (CUDA 12.8)
- Each version gets its own build-image and test-gpu job

* Fix rapids-cmake version and lib symlink for dual cuML support

- CMakeLists.txt.in: template RAPIDS_CMAKE_TAG and CMAKE_CXX_STANDARD
  so they adapt to the cuML version being built against
- configure.R: set rapids-cmake tag (v26.04.00 for 26.x, branch-21.10
  for 21.x) and C++ standard (17 for 26.x, 14 for 21.x)
- cuml.R: don't create premature lib symlink in download_libcuml()

* Derive rapids-cmake tag from cuML version instead of hardcoding

Use vYY.MM.00 for cuML >= 23.02 (stable tags), vYY.MM.00a for older
versions (only alpha tags available).
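
A hedged sketch of the derivation (function name is illustrative):

```r
rapids_cmake_tag <- function(cuml_version) {
  v <- numeric_version(cuml_version)
  parts <- unclass(v)[[1]]                   # e.g. c(21, 12)
  tag <- sprintf("v%02d.%02d.00", parts[1], parts[2])
  if (v < "23.2") paste0(tag, "a") else tag  # only alpha tags pre-23.02
}

rapids_cmake_tag("21.12")  # "v21.12.00a"
rapids_cmake_tag("26.04")  # "v26.04.00"
```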

* Require cmake 3.30.4+ for cuML 26.04 (auto-downloaded if missing)

rapids-cmake v26.04 needs cmake >= 3.30.4. The existing auto-download
logic handles this, but the min version threshold was hardcoded to 3.21.1.
Now it's 3.30.4 for cuML >= 23.02, 3.21.1 for older versions.
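
The threshold selection then reduces to (sketch, names illustrative):

```r
min_cmake_version <- function(cuml_version) {
  if (numeric_version(cuml_version) >= "23.2") "3.30.4" else "3.21.1"
}
```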

* Fix cuML 26.04 build: raft/rmm deps, static_assert, device_allocator

- Download libraft-cu12 and librmm-cu12 wheels alongside libcuml-cu12
  (cuml headers include raft/rmm headers which are in separate packages)
- Merge raft/rmm headers into libcuml/include/ during download
- Remove static_assert(CUML_VERSION_MAJOR == 21) — allow 26+
- Guard raft::mr::device::allocator (removed in raft 26.x) with version
  conditionals in device_allocator.cu/.h and stream_allocator.cu
- Use raft/core/handle.hpp instead of raft/handle.hpp for v26+

* Resolve cuML PyPI deps dynamically instead of hardcoding URLs

- Add tools/config/utils/pypi.R with resolve_native_deps() that walks
  the PyPI dependency tree for a package and returns download URLs for
  all native C++ dependencies (libraft, librmm, rapids-logger, cccl, etc.)
- libcuml_versions.R: cuML 26.04 entry is now just "libcuml-cu12"
  (the PyPI package name), not a hardcoded URL
- cuml.R: download_libcuml() detects PyPI package names vs direct URLs,
  resolves the full dep tree, downloads all wheels, and merges their
  include/ directories into libcuml/include/
- configure.R: load pypi.R utility
- Uses jsonlite for PyPI JSON API parsing
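
A simplified sketch of the PyPI lookup underneath resolve_native_deps() (the real helper also walks requires_dist recursively; this only fetches one package's wheel URL):

```r
library(jsonlite)

pypi_wheel_url <- function(pkg) {
  # Public PyPI JSON API: metadata for the latest release of `pkg`
  meta <- fromJSON(sprintf("https://pypi.org/pypi/%s/json", pkg))
  urls <- meta$urls$url
  urls[grepl("\\.whl$", urls)][1]   # first wheel artifact
}

pypi_wheel_url("libcuml-cu12")
```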

* Download CCCL 3.3 headers for cuML 26.04 builds

RMM 26.04 headers require CCCL >= 3.3 at compile time, but CCCL is not
a pip dependency (it's normally bundled with the CUDA toolkit). CUDA 12.x
ships CCCL 2.x which is too old. Download CCCL v3.3.0 from GitHub
releases (header-only, ~2MB) and merge into libcuml/include/.

Also handle pip wheels that extract to nested dirs like
nvidia/<subpackage>/include/.

* Put CUML_INCLUDE_DIR before CUDA toolkit includes

CCCL 3.3 headers (bundled in libcuml/include/) must take precedence
over the CUDA 12 toolkit's older CCCL 2.x headers. Swap include order
so cuml/raft/rmm/cccl headers are found first.

* Fix CCCL compat, pinned_allocator removal, and raft handle API

- Use RAPIDS-pinned CCCL commit (CUDA 12 compatible) instead of v3.3.0
  release tag which includes CUDA 13-only code
- pinned_host_vector.h: guard thrust::cuda::experimental::pinned_allocator
  (removed in CCCL 3.x); use plain host_vector on v26+
- handle_utils.cu: raft::handle_t no longer has set_stream(); reconstruct
  with stream_view via constructor on v26+

* Switch to cuML 25.12 (no CCCL 3.x requirement)

cuML 26.04's rmm headers require CCCL >= 3.3 which conflicts with
CUDA 12.x toolkit's CCCL 2.x. cuML 25.12 vendors its own CCCL in
librmm/include/rapids/ and has no CCCL version check — clean CUDA 12
compatibility.

- Target cuML 25.12 instead of 26.04
- Version guards: >= 26 -> >= 25 (same API changes apply)
- Re-enable KNN (knn.hpp exists in 25.12 with same API)
- Remove CCCL GitHub download (not needed)
- Update PyPI resolver to handle version pins (==25.12.*)
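
Handling a pin like libcuml-cu12==25.12.* might look like this (hypothetical helper, not the resolver's exact code):

```r
parse_pin <- function(spec) {
  # Split "name==pin"; pin is NULL when the spec has no version pin
  parts <- strsplit(spec, "==", fixed = TRUE)[[1]]
  list(name = parts[1], pin = if (length(parts) > 1) parts[2] else NULL)
}

parse_pin("libcuml-cu12==25.12.*")  # name "libcuml-cu12", pin "25.12.*"
parse_pin("libcuml-cu12")           # pin NULL
```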

* Define LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE for RMM

RMM headers require this define (normally set automatically by RMM's
cmake config, but we're using headers directly from the pip wheel).

* Revert cuML 25.x/26.x support (CCCL 3.x incompatible with CUDA 12)

All RAPIDS 25.x+ pip wheels require CCCL 3.x headers which are
incompatible with CUDA 12's bundled CCCL 2.x. No version of
libcuml-cu12 can be compiled against a stock CUDA 12 toolkit.

Revert to cuML 21.12 as the default for now. Supporting newer cuML
will require either CUDA 13 or a custom build environment.

---------

Co-authored-by: Tomasz Kalinowski <kalinowskit@gmail.com>
